Constructing a set of human genes with low 
sequence identity 

N. Tolstrup, K. Dalsgaard, J. Engelbrecht(*), Soren Brunak 
Abstract 

A non-redundant set of genes is needed for a wide range of sequence analysis purposes, for example, 
the estimation of the power of gene finding algorithms. Here we present a new method for 
redundancy reduction. The redundancy is quantified by similarity measures, but computing distances 
between genes is non-trivial because the available alignment methods do not in general produce 
identical results when applied to the same sequences. We extracted 550 human DNA sequences from 
GenBank, each containing at least one complete human gene with all its exons and introns. All 
sequences were aligned against each other with three different alignment methods: global alignment 
with and without penalty for end gaps, ALIGN/ALIGN0 (Myers & Miller 1988), and the FASTA 
local heuristic alignment method based on hash tables (Pearson 1990). The noise level for each 
alignment was determined, and thresholds for significant alignments were found. Based on this a 
distance measure is presented that vanishes for uncorrelated sequences and is non-zero for correlated 
sequence pairs. 
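
The idea of a measure that vanishes for uncorrelated pairs can be illustrated with a minimal sketch. The abstract does not state how the noise level or the thresholds were computed, so the rule used here (mean plus k standard deviations of background identities) and the names `noise_threshold`, `pair_measure` and `k` are assumptions made purely for illustration.

```python
from statistics import mean, stdev


def noise_threshold(background_identities, k=3.0):
    """Assumed rule: threshold = mean + k * sample standard deviation
    of identities observed for presumably unrelated sequence pairs."""
    return mean(background_identities) + k * stdev(background_identities)


def pair_measure(identity, threshold):
    """Zero for pairs at or below the noise threshold (uncorrelated),
    positive for pairs above it (correlated)."""
    return max(0.0, identity - threshold)


# Toy background identities (percent); not data from the study.
background = [20.0, 22.0, 24.0, 26.0, 28.0]
threshold = noise_threshold(background)

print(pair_measure(30.0, threshold))       # below the threshold: 0.0
print(pair_measure(80.0, threshold) > 0)   # above the threshold: True
```

Each alignment method would need its own threshold under this scheme, since the abstract notes that the methods do not in general agree on the same sequences.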

By clustering human DNA sequences using this distance measure a non-redundant data set can be 
constructed. The figure below shows the number of non-redundant sequences found for clusters with 
cutoff values ranging from 45 to 100% sequence identity. The choice of the cutoff identity depends 
on the type of sequence analysis one needs to do. If a low cutoff identity is used, only a small set of 
very distant sequences is found, thereby possibly reducing the original data set more than necessary 
and discarding valuable information. If a very high cutoff is chosen, only very close sequence 
partners are discarded, and many sequences will still be quite similar to each other, thereby possibly 
biasing the analysis. The correct cutoff is a balance between these two extreme situations. We found 
80% to be a sensible choice. A database of human genes with identities below 80% is presented, 
representing a well defined and truly non-redundant data set. 
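
The effect of the cutoff can be illustrated with a short sketch. It uses a simple greedy selection rather than the clustering procedure of the abstract, which is not specified in enough detail to reproduce; the function name `nonredundant_subset`, the gene names and the identity values are all hypothetical.

```python
def nonredundant_subset(names, identity, cutoff=80.0):
    """Greedy selection: keep a sequence only if its identity to every
    sequence already kept is below the cutoff (in percent).
    Pairs missing from `identity` are treated as unrelated (0%)."""
    kept = []
    for name in names:
        if all(identity.get(frozenset((name, other)), 0.0) < cutoff
               for other in kept):
            kept.append(name)
    return kept


# Toy pairwise identities (percent); not data from the study.
names = ["geneA", "geneB", "geneC", "geneD"]
identity = {
    frozenset(("geneA", "geneB")): 92.0,
    frozenset(("geneA", "geneC")): 55.0,
    frozenset(("geneB", "geneC")): 60.0,
    frozenset(("geneC", "geneD")): 81.0,
}

print(nonredundant_subset(names, identity, cutoff=80.0))
print(nonredundant_subset(names, identity, cutoff=100.0))
```

With the 80% cutoff the toy set shrinks to geneA and geneC; with a 100% cutoff all four sequences are retained. The result of a greedy pass depends on the input order, which is one reason a clustering-based method may be preferable in practice.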

Death caspase-1 



EVOLUTIONARY HOMOLOGS (part 1/2) 

In comparison with other caspase family members, Drosophila caspase-1 is more homologous to 
CPP-32 and MCH-2alpha than to ICE. It shares 37% sequence identity with both CPP-32 and 
MCH-2alpha, 29% identity with NEDD-2 (ICH-1), 28% homology with CED-3 and 25% 
homology with human ICE. This sequence similarity suggests that DCP-1 may be a member of 
the ced-3/CPP-32 subfamily of caspases (Song, 1997). 

A second Drosophila Caspase has been isolated and termed drICE. drICE is distinct from Death 
Caspase-1 and exhibits highest homology with the mammalian caspases Mch2 and CPP32β. 
drICE also contains a region in its putative small subunit that corresponds to the P4-specificity 
loop of CPP32β. Overexpression of drICE sensitizes Drosophila cells to apoptotic stimuli; 
expression of an N-terminally truncated form of drICE rapidly induces apoptosis in Drosophila 
cells. Induction of apoptosis by reaper overexpression or by cycloheximide or etoposide 
treatment of Drosophila cells results in proteolytic processing of drICE. drICE is a cysteine 
protease that cleaves baculovirus p35 and Drosophila lamin Dm0 in vitro. drICE is expressed at 
all stages of Drosophila development at which programmed cell death can be induced. Levels are 
highest from 2-6 hours of embryogenesis, lower from 6-12 hours, and still lower after 12 hours of 
development. These results strongly argue that drICE is an apoptotic caspase that acts 
downstream of reaper (Fraser, 1997). 

Properties of caspases 

The crystal structure at 2.5 Å resolution of a recombinant human ICE-tetrapeptide 
chloromethylketone complex reveals that the holoenzyme is a homodimer of catalytic domains, 
each of which contains a p20 and a p10 subunit. The spatial separation of the C-terminus of p20 
and the N-terminus of p10 in each domain suggests two alternative pathways of assembly and 
activation in vivo. Conservation among members of the ICE/CED-3 family of the amino acids 
that form the active site region of ICE supports the hypothesis that they share functional 
similarities (Walker, 1994). 

In vitro transcribed and translated Ced-3 protein (p56) undergoes rapid processing to smaller 
fragments. Replacement of the predicted active site cysteine of Ced-3 with serine (C364S) 
prevents the generation of smaller proteolytic fragments, suggesting that the processing might be 
an autocatalytic process. Peptide aldehydes with aspartic acid at the P1 position block Ced-3 
autocatalysis. Furthermore, the protease inhibition profile of Ced-3 is similar to the profile 
reported for ICE. These functional data demonstrate that Ced-3 is an Asp-dependent cysteine 
protease with substrate specificity similar to that of ICE. Aurintricarboxylic acid, an inhibitor of 
apoptosis in mammalian cells, blocks Ced-3 autocatalytic activity, suggesting that an 
aurintricarboxylic acid-sensitive Ced-3/ICE-related protease might be involved in the apoptosis 
pathway(s) in mammalian cells (Hugunin, 1996). 

The full-length CED-3 protein undergoes proteolytic activation to generate a CED-3 cysteine 
protease. CED-3 protease activity is required for killing cells by programmed cell death in C. 
elegans. In its substrate preferences CED-3 is more similar to the mammalian CPP32 protease 
than to mammalian ICE or NEDD2/ICH-1 protease. These results suggest that different 
mammalian CED-3/ICE-like proteases may play distinct roles in mammalian apoptosis and that 
CPP32 is a candidate for being a mammalian functional equivalent of CED-3 (Xue, 1996). 

Members of the caspase family 

The C. elegans cell death gene ced-3 is most abundant during embryogenesis, the stage during 
which most programmed cell deaths occur. The predicted CED-3 protein shows similarity to 
human and murine interleukin-1 beta-converting enzyme and to the product of the mouse nedd-2 
gene, which is expressed in the embryonic brain. The sequences of 12 ced-3 mutations as well as 
the sequences of ced-3 genes from two related nematode species identify sites of potential 
functional importance. It is proposed that the CED-3 protein acts as a cysteine protease in the 
initiation of programmed cell death in C. elegans and that cysteine proteases also function in 
programmed cell death in mammals (Yuan, 1993). 

Fas/APO-1-mediated apoptosis requires the activation of a class of cysteine proteases, including 
interleukin-1 beta-converting enzyme (ICE). Triggering of Fas/APO-1 rapidly stimulates the 
proteolytic activity of ICE. Overexpression of ICE strongly potentiates Fas/APO-1-mediated cell 
death. Inhibition of ICE activity by protease inhibitors, as well as by transient expression of the 
pox virus-derived serpin inhibitor CrmA or an antisense ICE construct, substantially suppresses 
Fas/APO-1-triggered cell death. It is concluded that activation of ICE or an ICE-related protease 
is a critical event in Fas/APO-1-mediated cell death (Los, 1995). 

Nedd2 encodes a protein similar to the mammalian interleukin-1 beta-converting enzyme (ICE) 
and the product of the C. elegans cell death gene ced-3 (CED-3). Overexpression of Nedd2 in 
cultured fibroblast and neuroblastoma cells results in cell death by apoptosis, which is suppressed 
by the expression of the human bcl-2 gene, indicating that Nedd2 is functionally similar to the 
ced-3 gene in C. elegans. During embryonic development, Nedd2 is highly expressed in several 
types of mouse tissue undergoing high rates of programmed cell death, such as central nervous 
system and kidney. This work suggests that Nedd2 is an important component of the mammalian 
programmed cell death machinery (Kumar, 1994). 

The pivotal discovery that Fas-associated death domain protein (FADD)-like 
interleukin-1 beta-converting enzyme (FLICE)/MACH is recruited to the CD95 signaling complex 
by virtue of CD95's ability to bind the adapter molecule FADD establishes that this protease has a 
role in initiating the death pathway. A new member of the caspase family has been cloned, a 
homologue of FLICE/MACH, and Mch4. Since the overall architecture and function of this 
molecule are similar to those of FLICE, it has been designated FLICE2. Importantly, the 
carboxyl-terminal half of the small catalytic subunit that includes amino acids predicted to be 
involved in substrate binding is distinct. The pro-domain of FLICE2 encodes a functional death 
effector domain that binds to the corresponding domain in the adapter molecule FADD. 
Consistent with this finding, FLICE2 is recruited to both the CD95 and p55 tumor necrosis factor 
receptor signaling complexes in a FADD-dependent manner. A functional role for FLICE2 is 
suggested by the finding that an active site mutant of FLICE2 inhibits CD95 and tumor necrosis 
factor receptor-mediated apoptosis. FLICE2 is therefore involved in CD95 and p55 signal 
transduction (Vincenz, 1997). 

A novel apoptotic gene has been cloned from human Jurkat T-lymphocytes. The 32-kDa putative 
cysteine protease (CPP32) has significant homology to C. elegans cell death protein Ced-3, 
mammalian interleukin-1 beta-converting enzyme (ICE), and the product of the mouse nedd2 
gene. The CPP32 transcript is highly expressed and most abundant in cell lines of lymphocytic 
origin. Overexpression of CPP32 or ICE in Sf9 insect cells results in apoptosis. In addition, 
coexpression of recombinant p20 and p11 derived from the parental full-length CPP32 sequence 
results in apoptosis in Sf9 cells. Similar to ICE, CPP32 is made of two subunits, p20 and p11, 
which form the active CPP32 complex. The apoptotic activity of CPP32 and its high expression 
in lymphocytes suggest that CPP32 is an important mediator of apoptosis in the immune system 
(Fernandes-Alnemri, 1994). 

The enzyme apopain is the protease responsible for the cleavage of poly(ADP-ribose) 
polymerase, and is necessary for apoptosis. It is composed of two subunits of relative molecular 
mass (M(r)) 17K and 12K, derived from a common proenzyme identified as CPP32. This 
proenzyme is related to interleukin-1 beta-converting enzyme (ICE) and CED-3, the product of a 
gene required for programmed cell death in C. elegans. A potent peptide aldehyde inhibitor has 
been developed and shown to prevent apoptotic events in vitro, suggesting that apopain/CPP32 is 
important for the initiation of apoptotic cell death (Nicholson, 1995). 

The cell surface receptor Fas (Apo-1/CD95) belongs to the tumor necrosis factor/nerve growth 
factor receptor family; it transmits apoptotic signals by binding to its ligand. 
Interleukin-1 beta-converting enzyme (ICE), which shows substantial homology to the product of 
ced-3, the cell death gene of C. elegans, is reported to be involved in Fas-mediated apoptosis. 
Using two human carcinoma-derived cell lines with undetectable levels of ICE, it was found that 
an agonistic anti-human Fas antibody induces the activation of CPP32/Yama(-like) proteases that 
are ICE(-like) protease family members. A tetrapeptide inhibitor of CPP32/Yama protease, 
DEVD-CHO, inhibits the Fas-mediated activation of the proteases, Fas-mediated apoptosis, and 
CPP32/Yama(-like) proteolytic activities in vitro. Fas-mediated apoptosis is inhibited by the 
CPP32/Yama inhibitor DEVD-CHO, but not by the ICE inhibitor YVAD-CHO, suggesting a 
dominant role for the CPP32/Yama(-like) proteases and not ICE itself in Fas-mediated apoptosis 
of the human carcinoma cell lines (Hasegawa, 1996). 

Upstream regulators of the cell death hierarchy 

continued: see Caspase-1: Evolutionary Homologs part 2/2 

The band 4.1 superfamily that includes 81kD ezrin has expanded to include a 
number of new members. An F-actin barbed end capping protein of 82kD from hepatic 
adherens junctions, called radixin, has 75% protein sequence identity to ezrin (1, 2). A 
human heparin-binding protein of 77kD, named moesin, has a protein sequence 74% 
identical to p81 ezrin (3) and 81% identical to radixin (2). Moesin was found to be 
identical to a human placental p77 protein described earlier (4). All these proteins 
show about 30% sequence identity with the amino-terminal ~260 residues of band 4.1. 
Other members of this superfamily include talin (5), the product of the 
neurofibromatosis gene, merlin/schwannomin (6, 7), and two tyrosine protein 
phosphatases (8, 9). 

Ezrin is known to be a phosphoprotein. In A431 cells tyrosine phosphorylation 
correlates with the EGF-dependent induction of surface structures that contain actin and 
ezrin (4). The sites of tyrosine phosphorylation on ezrin have been mapped (10). It has 
recently been shown to become phosphorylated on ser/thr in gastric parietal cells 
induced to translocate the proton pump to the apical membrane by elevation of cAMP, 
with the coincident appearance of microvilli that contain ezrin (11, 12). Ezrin is a 
relatively specific substrate for calpain I cleavage in parietal cells (13). Ezrin is a 
substrate for a tyrosine kinase(s) in T-cells (14). Transfection experiments indicate 
that ezrin has separable domains that associate with the membrane or the cytoskeleton 
(15). A previously identified tumor transplantation antigen has been identified as 
ezrin (16). 

The distinct contributions of ezrin, radixin and moesin to cellular structure have 
not yet been elucidated. Ezrin and moesin appear to show a very similar localization in 
cultured cells (17), with both proteins being enriched in surface structures that 
contain an actin cytoskeleton. Radixin seems to be enriched in adherens junctions and 
focal contacts of cultured cells (1,2); whether it is also present in cell surface 
structures containing ezrin and moesin is not yet clear (17). Ezrin and moesin have 
strikingly different tissue distributions: ezrin is enriched in many epithelial cells, 
where it is generally restricted to the microvilli on the apical aspect of polarized cells, 
whereas moesin is most abundant in endothelial cells (18). Given the very different 
tissue distributions of ezrin and moesin, it is remarkable that both are generally, but 
not invariably, found in cultured cells (2,17). 

Recent results with cultured cells have revealed that ezrin and moesin can 
exist as monomers, dimers or heterodimers in vivo or in vitro (19). The homotypic and 
heterotypic association domains are located in the C-terminal halves of the molecules 
(20). 

A summary that includes a discussion of the ezrin, radixin, moesin (ERM) family 
of proteins has appeared (21). 